Working with Data in a Connected World - Clair J. Sullivan | PyData Global 2021

Duration: 01:26:20

Views: 169

Likes: 2

Date Created: Jan, 2022

Channel: PyData

Category: Science & Technology

Tags: python, learn to code, education, software, pydata, learn, coding, how to program, julia, opensource, scientific programming, numfocus, python 3, tutorial

Description: Working with Data in a Connected World: the Power of Graph Data Science

Speaker: Clair J. Sullivan

Summary

Data science and machine learning have traditionally revolved around creating models based on the assumption that individual data points are uncorrelated. However, this ignores a signal that could potentially be very strong: the relationships between data points. We will look at this data as a network graph and explore how to unlock its potential using a graph database.

Description

This hands-on tutorial will begin with a discussion of querying data in a tabular environment such as SQL or Pandas dataframes. We will show hints in that data that help identify whether your problem would be better expressed as a graph problem. From there, we will provide a brief introduction to the graph theory concepts that are most relevant to data scientists, such as centrality algorithms (e.g., PageRank), community detection algorithms, node similarity, and path finding.

Next we will discuss some standard Python packages used for graph analytics, which will serve as motivation for working with graph databases, given their significant improvements in scalability and their simple querying. We will then create our own free graph database using the Neo4j Sandbox to do some hands-on data science. Using our database, we will demonstrate how to use standard Python packages to populate the graph and query the data within it. This will include a brief introduction to the Cypher query language, commonly used for analyzing graphs, and why this approach is much more efficient than using traditional relational databases or in-memory graph analytics in Python. There will also be an introduction to visualizing graphs within the browser. We will conclude with how to create a machine learning model from a graph, based on the calculation of graph embeddings, to perform a common task such as node classification.

Clair J. Sullivan's Bio

Dr. Clair Sullivan is currently a graph data science advocate at Neo4j, working to expand the community of data scientists and machine learning engineers using graphs to solve challenging problems. She received her doctorate in nuclear engineering from the University of Michigan in 2002. After that, she began her career in nuclear emergency response at Los Alamos National Laboratory, where her research involved signal processing of spectroscopic data. She spent 4 years working in the federal government on related subjects and returned to academic research in 2012 as an assistant professor in the Department of Nuclear, Plasma, and Radiological Engineering at the University of Illinois at Urbana-Champaign. While there, her research focused on using machine learning to analyze the data from large sensor networks. Deciding to focus more on machine learning, she accepted a job at GitHub as a machine learning engineer while maintaining adjunct assistant professor status at the University of Illinois. In 2021 she joined Neo4j as a Graph Data Science Advocate. Additionally, she founded a company, La Neige Analytics, whose purpose is to provide data science expertise to the ski industry. She has authored 4 book chapters, over 20 peer-reviewed papers, and more than 30 conference papers. Dr. Sullivan was the recipient of the DARPA Young Faculty Award in 2014 and the American Nuclear Society's Mary J. Oestmann Professional Women's Achievement Award in 2015.

GitHub: github.com/cj2001
Twitter: twitter.com/CJLovesData1
LinkedIn: linkedin.com/in/dr-clair-sullivan-09914342

PyData Global 2021

Website: pydata.org/global2021
LinkedIn: linkedin.com/company/pydata-global
Twitter: twitter.com/PyData

pydata.org

PyData is an educational program of NumFOCUS, a 501(c)(3) non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.

00:00 Welcome!
00:10 Help us add time stamps or captions to this video! See the description for details.

Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: github.com/numfocus/YouTubeVideoTimestamps
